1. INTRODUCTION

Collaboration and competitiveness among businesses necessitate significant data exchanges, which
are required for interoperability. Companies use ontologies to describe the meaning of their data. The
challenge that makes interoperability difficult is the lack of coherence between ontologies. In recent years,
different tools and methods have been proposed for reconciling different ontologies. Most existing
methods focus on evaluating the degree of similarity between their concepts. In the
literature, semantic similarity methods are classified into two main categories: single-ontology similarity
methods and cross-ontology similarity methods.

First, consider semantic similarity measures within a single ontology [1]. They can be summarized as
follows: i) Ontology-based approaches [2]-[11] take into account the path length and depth in the ontology.
When applied to large-scale ontologies, the disadvantage of these approaches is that many inheritance
relationships are ignored. Other factors that influence semantic similarity are also not taken into account in
this strategy. ii) Information content-based approaches [12]-[17] quantify the amount of information a
concept expresses in the taxonomy by computing similarity in terms of the shortest path between target concepts
in the taxonomy. These approaches have the disadvantage of being greatly influenced by the corpus. iii) The
feature-based approach [18]-[20] considers the features that are shared by two concepts as well as the distinguishing
features that are unique to each concept. This method has the advantage of being able to solve the problem of
semantic similarity across ontologies. The disadvantage is that it is better suited to processing large ontologies
with extensive semantic knowledge than small ontologies. iv) The hybrid-based approach combines different
sources of information to measure the level of similarity between concepts. Attribute similarity, ontology
structure, information content, and node depth are all factors considered in these methods [21].

According to Elavarasi et al. [22], the main advantage of these approaches is that if the knowledge
from one information source is inadequate, it may be derived from alternate information sources, which
generally improves the quality of the similarity measure. Some representatives of this approach are [23]-[26].
The hybrid approach considers more factors than the single-source approaches, but it mostly relies on expert
experience and on manual assignment of the weight factors of each
element.

Second, consider cross-ontology semantic similarity. Elavarasi et al. [22] and Saruladha et al. [27] have
developed measures that compute similarity among concepts in different ontologies. The motivation
is the growing number of information sources on the web, which makes it difficult to evaluate the similarity
between concepts. Cross-ontology measures match the words or concepts from different
ontologies, and they often need a hybrid or feature-based approach, because the
structure and information content of different ontologies cannot be directly compared [22]. The similarity
measures between the concepts of different ontologies fall into two main classes: path-length-based measures
and feature-based measures.

The first class is the path-length-based approach, which is similar to the approach used in the
semantic similarity measures for a single ontology. Within the framework of the
Unified Medical Language System (UMLS), Al-Mubaid and Nguyen [28] propose an ontology-structure-based
technique for measuring semantic similarity in a single ontology and across ontologies in the biomedical
domain. It addresses the problem that many existing semantic similarity measures that use ontology
structure as their primary source cannot measure semantic similarity between terms and concepts when
multiple ontologies are used. Their measure is based on three features: i) a new feature of common
specificity of concepts in the ontology; ii) local granularity of ontology clusters; and iii) cross-modified
path length between two concepts.

The second class is the feature-based approach [29]. Tversky [30]
proposes a set-theoretical approach to similarity, where objects are represented as collections of features and
similarity is described as a feature matching process. He demonstrates that the contrast model, which
expresses similarity between items as a linear combination of measures of their common and distinctive
features, is based on a set of qualitative assumptions. The work in [31] describes a method for computing semantic
similarity that eliminates the need for a single ontology and takes into account differences in the levels of
explicitness and formalization of distinct ontology specifications. Using a matching process across
synonym sets, semantic neighborhoods, and distinguishing features that are divided into parts, functions, and
attributes, a similarity function determines related entity classes. The work in [32] proposes a method for
computing semantic similarity that involves mapping concepts to ontologies and evaluating their links within
those ontologies. It investigates methods for computing semantic similarity between natural language terms
(using WordNet as the underlying reference ontology) and medical terms (using the MeSH ontology of
medical and biomedical terms). That research also focuses on cross-ontology approaches, which can compute
the semantic similarity between concepts from different ontologies. An adaptive e-learning system with a
cross-ontology similarity measure has been developed, with an automatically generated concept map as an
application to the e-learning system [33]. The major feature of this method is the user personalization model,
which assesses the student's learning capability. The other attraction of this method is that it uses multiple
ontologies for the evolution of the concept from a particular domain. The work in [34] proposes a hybrid
approach for measuring semantic similarity between ontologies based on WordNet, denoted WNOntoSim. The
semantic similarity across ontologies at the element level is calculated using WordNet. Semantic similarity
at the structural level is computed by creating contexts of nodes in which the
structure of the ontology is encoded, and the two scores are then combined to get a full semantic similarity
between ontologies. The main disadvantage of WNOntoSim is that it cannot accurately compute semantic
similarity between named entities because of the limited coverage of WordNet. A general approach
for assessing similarity across multiple ontologies is presented in [35]. Its two strategies (the first focusing
on high scalability and the second on high accuracy) aim to find a least common subsumer (LCS) that
accurately represents the commonalities between terms among multiple ontologies.

This approach is based on evidence found in background ontologies, both explicit (semantic) and
implicit (structural). In this paper, we propose an original approach for computing semantic similarity
between different ontologies. First, we merge the ontologies in order to have a path between
them. Then, we compute the similarity weight between concepts, which are in the form of sentences, using
WordNet. Finally, we hybridize node-based measures, namely those of Wu and Palmer (WuP) and Rada et al.,
with the similarity weight computed before. This combination is necessary to integrate and reinforce the
semantic factor. This paper is organized as follows: section 2 presents the architecture of the proposed
system. Section 3 presents the experimental results and evaluation measures. The conclusion and
perspectives are given in section 4.
2. THE PROPOSED APPROACH

Our approach is composed of four phases:

- Preprocessing.

- Computing the similarity measures of Wu and Palmer and of Rada et al.

- Computing the hybrid similarity measures WWP and WRA.

- Evaluating our approach using the cohesion and density methods.

Each phase is composed of several steps, which are explained in this section (see Figure 1).


[Figure 1: flowchart of the approach. Input: choice of the ontologies (OWL1 and OWL2) and of the language
(English or French). Preprocessing: extraction of concepts and relations, merger, lemmatization (relevant
terms), incidence matrix (0/1), shortest-path matrix between (Ci, Cj), distance between each concept and the
root, weight of the link between two concepts = 1. Compute similarities: WuP and Rada et al. measures,
semantic measure. Compute hybrid similarities: WWP and WRA measures. Validation process: experimental
results and evaluation measures.]

Figure 1. The proposed approach
2.1. Input OWL1 and OWL2 (benchmark dataset selection)
For the English benchmark dataset, we used a collection of ontologies describing the domain of
conference organization, provided by the Ontology Alignment Evaluation Initiative (OAEI) [36], which
organizes evaluation campaigns aimed at evaluating ontology-matching technologies. We justify the choice of
this dataset by the following points: i) the OAEI campaign is the most well-known evaluation campaign for
testing the performance of ontology matching systems; ii) in our approach, we need to apply similarity
criteria to different ontologies from the same domain, which is possible with this dataset.


2.2. Preprocessing 
First, in order to carry out our approach, we load two ontologies from a dataset file; this file
contains the same ontologies in different languages. The system then extracts all constructors from the
web ontology language (OWL) ontology file (the concepts and the relations between concepts in the same
ontology). Second, we merge the two ontologies selected in the previous step, using Protégé 2000, in order
to find paths connecting the concepts of the first ontology with the concepts of the second ontology. Protégé
2000 is one of the best-known ontology management tools currently available. Its usefulness here
is attributable to its integrated tools, such as the PROMPT Suite [37], a set of tools
for merging and mapping ontologies. One of the PROMPT tools is
iPROMPT, which performs basic ontology merging operations. The algorithm's first stage takes two
ontologies as input and returns a list of initial suggestions for matches based on the lexical
equivalence of the concept names [38]. The algorithm then moves on to the next step, in which the user
performs an operation of their choosing.

This operation is carried out by the algorithm after human intervention. The
operation is chosen by selecting one of the suggestions or by specifying the required operation in the
ontology-editing environment. In the next step, iPROMPT automatically executes the modifications
required by the selected operation. Then, iPROMPT generates a new list of suggestions based
on the structure of the ontology and identifies the inconsistencies and problems that result from the operation.
Finally, iPROMPT proposes solutions for these problems and generates the merged ontology.
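
The first stage described above (initial suggestions from lexically equivalent concept names) can be illustrated with a minimal sketch. This is not the iPROMPT implementation: the function names `normalize` and `suggest_matches` and the normalization rule (case-insensitive, separators ignored) are assumptions made for illustration only.

```python
def normalize(name):
    """Lower-case a concept name and drop separators (assumed normalization rule)."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def suggest_matches(concepts_a, concepts_b):
    """Return pairs (a, b) of concept names that are equal after normalization."""
    index = {}
    for name in concepts_b:
        index.setdefault(normalize(name), []).append(name)
    # Keep the order of the first ontology so the suggestion list is stable.
    return [(a, b) for a in concepts_a for b in index.get(normalize(a), [])]


# Hypothetical concept names, chosen only to show the matching behavior.
suggestions = suggest_matches(
    ["Paper", "Program_Committee", "Review"],
    ["paper", "ProgramCommittee", "Author"],
)
print(suggestions)
```

In a semi-automatic merger, such a list would only be a starting point: the user still accepts or rejects each suggested pair.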

However, the iPROMPT tool has some limitations:

- The merging algorithm is only semi-automatic.

- iPROMPT considers the structure of the ontology, but does not consider the
relations between the concepts or the pertinence of the concepts.

Figure 2 shows a screenshot of the node graphs of OWL1 (in red), OWL2 (in blue), and OWL3,
which represents the merging of OWL1 and OWL2 (red and blue).

Figure 2. Screenshot of the graph node of OWL1, OWL2 and OWL3
2.3. Lemmatization

After analyzing the concepts of the two ontologies, we noticed that each concept is a sentence (a set of
words). Before applying the similarity measure, we used TreeTagger [39] to lemmatize all of the concepts in
order to keep only the relevant words of each concept, as shown in Figure 3. The relevant word types
are nouns, adjectives, and verbs.
[Figure 3: two panels, "Each concept is a sentence and each sentence a set of words" and "The relevant words of each sentence"]
Figure 3. Preprocessing on concepts


TreeTagger is a tool for annotating text with part-of-speech and lemma information.
Schmid [39] created it as part of the TC project at the Institute for Computational Linguistics at the
University of Stuttgart. It has been successfully used to tag German, English, French, Italian, Danish,
Swedish, Norwegian, Dutch, Spanish, Bulgarian, Russian, Portuguese, Galician, Greek, Chinese, Swahili,
Slovak, Slovenian, Latin, Estonian, Polish, Persian, Romanian, Czech, Coptic, and Old French texts.
In our approach, we used this tool for English and French. The relevant terms
(nouns, adjectives, verbs) are selected using Algorithm 1 and are later used to compute the similarity
weights between concepts (Ci, Cj).


Algorithm 1. Lemmatization using TreeTagger

Input: Set of concepts Ci // each concept is a sentence and each sentence is a set of words

Output: Vrt: vector containing the relevant terms of each concept

Method:

Vrt ← ∅

for each concept Ci do

begin

    let Vtoken ← vector of the terms Tj of the concept Ci

    for each term Tj of the vector Vtoken do

    begin
        Lemmatization(Vtoken(Tj))
        let token ← term Tj
        let pos ← type of Tj
        let lemma ← original form of Tj
        if pos ∈ {Noun, Adjective, Verb} then
            Vrt ← Vrt ∪ {lemma}
    end
end
return Vrt
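
The filtering step of Algorithm 1 can be sketched in Python as follows. This sketch does not call TreeTagger: it assumes the tagger output is already available as (token, part of speech, lemma) triples, and the `Tagged` type, the coarse tag names, and the function `relevant_terms` are hypothetical names introduced for illustration.

```python
from typing import Iterable, List, NamedTuple


class Tagged(NamedTuple):
    """One tagger output line (assumed format): surface token, coarse POS, lemma."""
    token: str
    pos: str
    lemma: str


# Word types kept by the approach: nouns, adjectives and verbs.
RELEVANT_POS = {"Noun", "Adjective", "Verb"}


def relevant_terms(tagged_concept: Iterable[Tagged]) -> List[str]:
    """Return the lemmas of the relevant words of one concept, without duplicates."""
    lemmas: List[str] = []
    for term in tagged_concept:
        if term.pos in RELEVANT_POS and term.lemma not in lemmas:
            lemmas.append(term.lemma)
    return lemmas


# Hypothetical concept label "member of the program committees", tagged by hand.
concept = [
    Tagged("member", "Noun", "member"),
    Tagged("of", "Preposition", "of"),
    Tagged("the", "Determiner", "the"),
    Tagged("program", "Noun", "program"),
    Tagged("committees", "Noun", "committee"),
]
print(relevant_terms(concept))
```

Keeping the lemmas in an ordered list rather than a set preserves the word order of the concept, which may be convenient when inspecting the result.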


2.4. Computing matrices

In this step, we compute the incidence matrix of OWL3 by setting "0" if there is no link between
two concepts (Ci, Cj) and "1" otherwise. On this matrix, we apply the Dijkstra algorithm, one
of the most widely used algorithms for finding the shortest paths from a particular source node to any other
node, with the weight of each link equal to "1". We also compute the distance between each concept and the
root (R). The reason is that some similarity measures used in this work are based on the
computation of the distances that separate the desired nodes from the root node R and the distance that separates
the subsuming concept (CS) from these nodes. An example is the Wu and Palmer measure. This phase
generates the minimal distance between each concept and the root node R; see Algorithm 2.


Algorithm 2. Finding the shortest path between the concepts

Input: - Max: number of all concepts in the merged ontology OWL3,

- Mat: integer matrix of size (Max) x (Max),

- MI: matrix of links between the concepts, equal to 1 if there is a link, otherwise -1

Output: CS: matrix of all paths from each start node (d) to each arrival node (a)

Method:

for i = 0 : Max do

    for j = 0 : Max do

        if (MI[i,j] == -1) then

            Mat[i,j] ← 100000

d ← 0

while (d < Number of concepts) do
begin
    Dijkstra da // da is an instance of Dijkstra
    da ← set of all paths from start d to each arrival a in the Mat matrix
    for a = 1 : Max do
    begin
        let V be the vector of nodes on the way from d to a
        let N be an integer vector of size |V|
        path_a ← ∅ // path_a is the set of nodes where d is the first node and a is the last node,
                   // initialized to the empty set
        for i = 0 : sizeof(N) do
        begin
            N[i] ← Get(V(i)) // receive the value of V[i]
            path_a ← path_a ∪ {N[i]}
        end
        store path_a in the CS matrix at line d and column a
    end
    d ← d + 1
end while
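
As a complement to Algorithm 2, the following is a minimal Python sketch of the same idea: run Dijkstra from every concept on a 0/1 link matrix with unit link weights and store the resulting node sequences. The function names and the list-of-lists representation are assumptions for illustration, not the Java implementation used in the experiments.

```python
import heapq

INF = float("inf")


def dijkstra(adj, source):
    """Shortest distances and predecessors from source; adj[u][v] == 1 means a link of weight 1."""
    n = len(adj)
    dist = [INF] * n
    prev = [None] * n
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry, a shorter route to u was already processed
        for v in range(n):
            if adj[u][v] == 1 and d + 1 < dist[v]:
                dist[v] = d + 1
                prev[v] = u
                heapq.heappush(heap, (d + 1, v))
    return dist, prev


def path_to(prev, source, target):
    """Rebuild the node sequence from source to target, or [] if target is unreachable."""
    path = []
    node = target
    while node is not None:
        path.append(node)
        if node == source:
            return path[::-1]
        node = prev[node]
    return []


def all_paths(adj):
    """cs[d][a] holds the shortest path from start d to arrival a (the CS matrix of Algorithm 2)."""
    n = len(adj)
    cs = [[[] for _ in range(n)] for _ in range(n)]
    for d in range(n):
        _, prev = dijkstra(adj, d)
        for a in range(n):
            cs[d][a] = path_to(prev, d, a)
    return cs
```

With node 0 taken as the root R, the first row of distances returned by `dijkstra(adj, 0)` gives the distance between each concept and the root.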


2.5. Similarity measure computing 

In order to show the effectiveness of our approach, we implemented two baseline measures, those of Wu and
Palmer [5] and Rada et al. [3], for comparison. This evaluation is discussed in
section 3.1.


2.5.1. Wu and Palmer

The edge counting method proposed by Wu and Palmer [5] is defined as follows. OWL3 is
composed of a set of nodes and a root node (R), as shown in Figure 4. C1 and C2 are the two concepts of the
ontology for which we compute the similarity. The distances (N1 and N2) between nodes C1 and C2 and the
root node, as well as the distance (N) between the closest common ancestor (CS) of C1 and C2 and the node
R, are used to compute the similarity. The similarity measure proposed by Wu and Palmer [5] is defined as (1).


$$Sim_{WP}(C_1, C_2) = \frac{2 \times N}{N_1 + N_2} \quad (1)$$
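
To make the computation of the Wu and Palmer measure concrete, here is a small Python sketch. It assumes, for simplicity, a tree in which every concept has a single parent (given as a `parent` dictionary, with the root mapped to `None`); the concept names and the helper functions are hypothetical and only serve the illustration.

```python
def depth(parent, node):
    """Number of arcs between node and the root (the root has parent None)."""
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d


def closest_common_ancestor(parent, c1, c2):
    """Deepest concept that subsumes both c1 and c2 (the concept CS)."""
    ancestors = set()
    node = c1
    while node is not None:
        ancestors.add(node)
        node = parent[node]
    node = c2
    while node not in ancestors:
        node = parent[node]
    return node


def sim_wu_palmer(parent, c1, c2):
    """Sim = 2 * N / (N1 + N2), with N, N1 and N2 measured as distances to the root."""
    n1 = depth(parent, c1)
    n2 = depth(parent, c2)
    if n1 + n2 == 0:
        return 1.0  # assumption: the root compared with itself is fully similar
    n = depth(parent, closest_common_ancestor(parent, c1, c2))
    return 2 * n / (n1 + n2)


# Hypothetical hierarchy: R is the root, Person and Paper are its children.
parent = {"R": None, "Person": "R", "Paper": "R", "Author": "Person", "Reviewer": "Person"}
print(sim_wu_palmer(parent, "Author", "Reviewer"))  # N = 1, N1 = N2 = 2, so 2 * 1 / 4 = 0.5
```

Two concepts whose only common ancestor is the root obtain a similarity of 0, since N = 0 in that case.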


Figure 4. Example of a concept hierarchy 


The issue with this measure is that in the ontology, arcs represent equal distances, meaning that all
semantic links have the same weight. A comparison of similarity measurement methods
shows that the Wu and Palmer measure has the advantage of being simple to compute while
remaining as expressive as the others. This is the reason that led us to adopt this measure as the
basis of our hybrid approach.

Indonesian J Elec Eng & Comp Sci, Vol. 26, No. 1, April 2022: 493-504, ISSN: 2502-4752


2.5.2. Rada et al.

In an ontology, the minimal number of arcs that separate two concepts is computed to estimate
their similarity. This measure uses the edge-counting method on the shortest path between the two
concepts. Equation (2) defines the Rada et al. [3] measure:


$$Sim_{RA}(C_1, C_2) = \frac{1}{1 + dist(C_1, C_2)} \quad (2)$$


where dist(C1, C2) corresponds to the number of arcs that must be traversed in the ontology to connect the
concepts C1 and C2.
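
The measure in (2) can be sketched in a few lines of Python. Because every arc has weight 1 here, a breadth-first search is enough to obtain dist(C1, C2). The adjacency dictionary, the concept names, and the choice of returning 0 for unconnected concepts are assumptions made for this illustration.

```python
from collections import deque


def arc_distance(neighbors, c1, c2):
    """Minimal number of arcs between c1 and c2, or None if they are not connected."""
    seen = {c1}
    queue = deque([(c1, 0)])
    while queue:
        node, d = queue.popleft()
        if node == c2:
            return d
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None


def sim_rada(neighbors, c1, c2):
    """Sim = 1 / (1 + dist); unconnected concepts get 0.0 (assumed convention)."""
    d = arc_distance(neighbors, c1, c2)
    return 0.0 if d is None else 1.0 / (1.0 + d)


# Hypothetical undirected hierarchy stored as an adjacency dictionary.
neighbors = {
    "R": ["Person", "Paper"],
    "Person": ["R", "Author", "Reviewer"],
    "Paper": ["R"],
    "Author": ["Person"],
    "Reviewer": ["Person"],
}
print(sim_rada(neighbors, "Author", "Reviewer"))  # dist = 2, so the similarity is 1 / 3
```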


2.6. The weight similarity matrix 

The objective of this step is to compute the similarity weight between each concept of OWL1 and
the concepts of OWL2. Since the concepts of our two ontologies are in the form of sentences,
we rely on the lemmatization step applied previously to calculate the similarity weight between these
concepts. It gives us a set of relevant terms for every concept, so that we can first compute
the similarity weight between word pairs before moving to the phrase level.

We used WordNet. WordNet [40] is a free English electronic dictionary created by cognitive
scientists at Princeton University, directed by Miller. WordNet 2.0 contains 150,000 words
organized into 115,000 synonym sets, for a total of 207,000 word-sense pairs. Each synonym set
represents a fundamental semantic concept and is linked to others by lexical and conceptual-semantic relations.
We use WordNet to compute the similarity weight between two words. The value obtained is used in
(3) to compute the similarity weight between concepts. We then use the formula from [41] to calculate the
similarity between two concepts, the first from OWL1 and the second from OWL2.


$$Sim(C_{O1}, C_{O2}) = \frac{[\ldots]}{2n} + \frac{[\ldots]}{2m} \quad (3)$$

The similarity weights between the concepts are used to build the matrix shown in Figure 5,
which represents the semantic similarity between the two ontologies.

[Figure 5: diagram labeled "OWL 2", "Computation of similarity for each two concepts", and "browsing OWL 3"]

Figure 5. Matrix of the weight of similarity


2.7. Computing hybrid matrix 

In this step, the incidence matrix is updated by computing the union of the incidence matrix
computed in section 2.4 with the similarity weight matrix computed in section 2.6. The shortest-path
algorithm is then modified to use the updated incidence matrix. This hybrid similarity matrix allows us
to obtain the shortest path between concepts.
Algorithm 3. Optimizing the shortest path between concepts

Input: - Max: number of all concepts in the merged ontology OWL3,

- Mat: integer matrix of size (Max) x (Max),

- MI: matrix of links between the concepts, equal to 1 if there is a link, otherwise -1

- MP: matrix of similarity weights between all concepts.

Output: CS_up: updated matrix of hybrid similarity of all paths from each start node (d) to each
arrival node (a)

Method:

let MI_up ← union matrix of MI and MP

for i = 0 : Max do

    for j = 0 : Max do

        if (MI_up[i,j] == -1) then

            Mat[i,j] ← 100000

d ← 0

while (d < Number of concepts) do
begin
    Dijkstra da // da is an instance of Dijkstra
    da ← set of all paths from start d to each arrival a in the Mat matrix
    for a = 1 : Max do
    begin
        let V be the vector of nodes on the way from d to a
        let N be an integer vector of size |V|
        path_a ← ∅ // path_a is the set of nodes where d is the first node and a is the last node,
                   // initialized to the empty set
        for i = 0 : sizeof(N) do
        begin
            N[i] ← Get(V(i)) // receive the value of V[i]
            path_a ← path_a ∪ {N[i]}
        end
        store path_a in the CS_up matrix at line d and column a
    end
    d ← d + 1
end while
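
One possible reading of the union step in Algorithm 3 is sketched below, under explicit assumptions: a pair of concepts is treated as linked if it has a structural link in MI or if its similarity weight in MP exceeds a cut-off. The `threshold` parameter, its default value, and the function names are hypothetical; the text does not state how a similarity weight is turned into a link.

```python
NO_LINK = -1  # value used in Algorithm 2 and Algorithm 3 for "no link"


def union_matrix(mi, mp, threshold=0.5):
    """Build MI_up: link (1) if MI has a link or the MP weight exceeds threshold (assumption)."""
    n = len(mi)
    mi_up = [[NO_LINK] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # self-links are ignored in this sketch
            if mi[i][j] == 1 or mp[i][j] > threshold:
                mi_up[i][j] = 1
    return mi_up


def count_arcs(matrix):
    """Number of entries equal to 1, that is, the number of directed links."""
    return sum(cell == 1 for row in matrix for cell in row)


# Hypothetical 3-concept example: one structural link (0-1) and one strong similarity (0-2).
mi = [[-1, 1, -1], [1, -1, -1], [-1, -1, -1]]
mp = [[0.0, 0.2, 0.8], [0.2, 0.0, 0.1], [0.8, 0.1, 0.0]]
mi_up = union_matrix(mi, mp)
print(count_arcs(mi), count_arcs(mi_up))
```

Under this reading, the union can only add links, so the number of arcs in the updated matrix is never smaller than in the original incidence matrix.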


2.8. Computing the hybrid similarity 

The principle of computing similarity with node-based approaches such as those of Wu and Palmer and
Rada et al. is based on the idea that a shorter path between two nodes makes them more similar. Another
point about these approaches is that the arcs represent uniform distances. Therefore, these approaches have
the disadvantage that all semantic links have the same weight, which makes it difficult to define and
control the link distances. This is why we chose to hybridize these approaches with the similarity weight to
reinforce the semantic aspect.


2.8.1. WordNet Wu and Palmer (WWP) approach

In this approach, we apply the same principle as WuP, as shown in Figure 6, but we replace the values
of the semantic links, which are uniform in that method, with the similarity weight
between each pair of concepts in OWL3 computed in the weight similarity matrix. We then use the hybrid matrix
computed in section 2.7 to update the shortest path between concepts. This modification highlights the
semantic aspect in the computation of the similarity.




Figure 6. Example of a concept hierarchy in WWP 
2.8.2. WordNet Rada et al. (WRA) approach

In this approach, we use the formula of Rada et al., but with an updated computation of the
distance between C1 and C2, as shown in (4), where dist_sh(C1, C2) corresponds to the number of
arcs that must be crossed in the ontology to connect the concepts C1 and C2, taking into account the
similarity weight between the concepts.


$$Sim_{WRA}(C_1, C_2) = \frac{1}{1 + dist_{sh}(C_1, C_2)} \quad (4)$$


3. EXPERIMENTAL EVALUATION 

We performed our experiment on two ontologies. The first one, Cmt, contains
29 concepts and 76 relations. The second one, ConfOF, contains 38 concepts and 107 relations. These two
ontologies describe the conference domain. Our system analyzed and extracted the OWL constructors of the
two ontologies in 20 seconds.


3.1. Experimental results 

Some experimental evaluations of the proposed approach are described in this section. All of the
tests were executed on a laptop with an Intel Core (TM) i3 Duo 2.30 GHz processor and 4 GB of RAM running
Windows 7 and Java with NetBeans 8.2. We report the results of our experiment by presenting evaluation
measures and comparisons between node-based approaches and our approach. Table 1 compares
the performance of our WWP and WRA measures with two popular similarity measures, WuP and
Rada et al. The analysis shows that the proposed approach produces results that are better than or equal
to those of the other methods at every threshold for the same two ontologies, Cmt and ConfOF, from the
conference domain. Figure 7 presents two graphs: the first compares the performance of our WWP measure
with WuP, and the second compares our WRA measure with Rada et al. The number of arcs increased by 9%
in this experiment. This is due to the impact of integrating the similarity weight in the computation of
the shortest path between the concepts.
Figure 7. Comparison between the performances of our WWP and WRA measures with
WuP and Rada et al.
3.2. Evaluation measures

Although many evaluation methods are available today, the cohesion and density evaluation
methods are used in this paper. Figures 8 and 9 compare the density and cohesion of the proposed
approach with those of the other methods (Wu and Palmer and Rada et al.). We compute the density and the
cohesion between pairs of concepts (Ci, Cj) by treating these concepts as context units. To determine the
density, we use the probability of terms appearing in these units as well as explicit semantic relations.
Comparing the results obtained, we notice that the density is higher in our approach than in the other
methods, as shown in Figure 8. Cohesion refers to the degree of relatedness of OWL concepts that are
semantically related by the properties of items in the ontologies; we notice that the cohesion in our
approach is stronger than in the Wu and Palmer and Rada et al. approaches, as shown in Figure 9.


Table 1. Comparison between the performances of our WWP and WRA methods with WuP and Rada et al.

No  Threshold  Wu and Palmer (%)  Rada et al. (%)  WWP (%)  WRA (%)
1   0.10       48.911             74.733           76.225   76.225
2   0.20       48.911             16.333           76.225   56.261
3   0.30       48.911             11.887           76.225   30.217
4   0.40       40.108             7.894            62.159   10.254
5   0.50       35.571             0.725            58.711   0.725
6   0.60       16.878             0.725            28.13    0.725
7   0.70       0.544              0.725            4.264    0.725
8   0.79       0.544              0.725            0.544    0.725
9   0.89       0.0                0.725            0.0      0.725
10  0.99       0.0                0.725            0.0      0.725
[Figure 8: two density graphs plotted against the threshold; recovered legend entries: Wu and Palmer (%), Reda and al (%)]

Figure 8. Graph evaluating the density

Figure 9. Graph evaluating the cohesion


4. CONCLUSION 

In this paper, we presented a system to compute the semantic similarity between two different
ontologies from the same domain. These ontologies are represented in OWL. The main objective of this
paper was to demonstrate the impact of integrating the WordNet-based similarity weight between concepts
into node-based approaches. This allowed us to optimize the shortest path and to reinforce the semantic factor
between these concepts. We compared our two methods (WWP and WRA) with the WuP and
Rada et al. methods, respectively, and the results obtained are encouraging. For future work, there are
several possible directions. One is the ontology merging algorithm. Another is to explore how
this measure influences research effectiveness on the same ontologies in different languages, French and
Arabic.